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Abstract 

This report presents a description of the work done under the NASA 
Grant NAG 9-361. It describes techniques for positional estimation of a 
mobile robot navigating in an outdoor environment. A comprehensive re- 
view of the various positional estimation techniques studied in the literature 
is first presented. The techniques are divided into four different types and 
each of them is discussed briefly. In this work two different kinds of envi- 
ronments are considered for positional estimation; a mountainous natural 
terrain and an urban, man-made environment with polyhedral buildings. 
In both these cases the robot is assumed to be equipped with a single visual 
camera that can be panned and tilted and also a 3-D description (world 
model) of the environment is given. This world model is in the form of 
Digital Elevation Map (DEM) for the natural terrain case. For the urban 
environment, it is assumed that the 3-D descriptions of the roof tops of the 
buildings axe given. Such a description could be obtained from a stereo pair 
of aerial images or from the architectural plans of the buildings. Techniques 
for positional estimation using the camera input and the world model are 
presented. 
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1 Introduction 


The problem of positional estimation has received considerable attention, and many techniques 
have been proposed for solving it. Most mobile robots are equipped with wheel encoders that give 
an estimate of the robot’s position at every instant. However, the information from these encoders 
is differential, and, due to wheel slippage and quantization effects, these estimates of the robot’s 
position contain small errors. However the errors build up quickly as the robot moves, and the 
position estimate becomes increasingly uncertain. So, most mobile robots use some other form 
of sensing, like vision or range, to sense the environment and to aid in the position estimation 
process. 

Various techniques have been studied for estimating the position and pose of an autonomous 
mobile robot using different kinds of sensors. The techniques vary depending on the kind of 
environment in which the robot navigates, the known conditions of the environment, and the 
type of sensors with which the robot is equipped. Broadly the position estimation techniques can 
be classified into the following four types: 1) landmark-based methods 2) methods using trajectory 
integration and dead reckoning 3) methods using a standard reference pattern and 4) methods 
using the a priori knowledge of a world model and matching sensor data with the world model 
for position estimation. Section 2 describes all these different techniques briefly. More details are 
given in [57] 

The present work falls into the fourth category i.e., the robot is aided in its navigational 
tasks by providing a priori information about the environment in the form of a preloaded world 
model. The basic idea is to sense the environment using on board sensors on the robot and then 
to try to match these sensory observations to the preloaded world model. This process yields an 
estimate of the robot’s position and pose with a reduced uncertainty and then allows the robot to 
perform other navigational tasks. The problem in such an approach is that the sensor readings 
and the world model may be in different forms. For instance, given a CAD model of the building 
and a visual camera, the problem is to match the 3-D descriptions in the CAD model to the 2-D 
visual images. 

In this work techniques are presented for estimating the position of a mobile robot in an 
outdoor environment. Two kinds of environments are considered; a mountainous natural terrain 
and an urban man-made environment consisting of polyhedral buildings. 

In the former case a Digital Elevation Map (DEM) of the area in which the robot is to 
navigate is assumed to be given. The robot is also assumed to be equipped with a camera that 
can be panned and tilted, and a device to measure the robot’s elevation above the datum. No 
recognizable landmarks are assumed to be present in the environment where the robot is to 
navigate. The DEM is a 3-D database. It records the terrain elevations for ground positions at 
regularly spaced intervals. The images recorded by the camera are 2-D intensity images. The 
problem is to find common features to match the 2-D images to the 3-D DEM. The presented 
solution makes use of the DEM information and structures the problem as a heuristic search in 
the DEM for the possible robot location. The shape and position of the horizon line in the image 
plane and the known camera geometry of the perspective projection are used as parameters to 
search the DEM. Heuristics drawn from geometric constraints are also used to reduce the search 


1 


space [54, 55]. This is described in section 3. 

The case of mobile robot navigating in a structured, man-made, urban environment consisting 
of polyhedral buildings is considered in section 4. The 3-D descriptions of the roof tops of the 
buildings is assumed to be given. Such a description may be obtained from a pair of stereo aerial 
images or from the architectural plans of the buildings. The robot use6 the camera to image the 
surroundings, each time adjusting the tilt angle so that the roof top edges are dearly visible in the 
image plane. Now if a correspondence is established between the 3-D descriptions of these edges 
and their images the position and pose of the robot can be estimated. However establishing this 
correspondence is in general not a trivial problem. To alleviate this problem, it is proposed to use 
the geometric relations between the 3-D descriptions of the roof top edges (model edges) to prune 
the list of possible correspondences. The viewing plane (the plane in which the robot navigates) 
is divided into distinct, non-overlapping regions called the Edge Visibility Regions (EVRs). These 
EVRs essentially capture the geometric relations between the model edges with regard to their 
visibility from various regions in the viewing plane. Assodated with each EVR is a Visibility List 
(VL) which is a list of the model edges that are visible in that EVR; also stored for each edge in 
the VL of an EVR is the range of orientation angles of the robot for which the edge is visible in 
this EVR. In this research methods for generating the EVRs given a world model are discussed. 
An upperbound on the maximum number of EVRs that would be generated for a given model is 
derived [56]. Also the use of the EVRs and their associated VLs in estimating the position and 
pose of an autonomous mobile robot is discussed [58]. The use of the EVR representation of the 
environment of a robot for other navigational tasks like path-planning are outlined. 
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2 Review of positional estimation techniques 


In this section, various methods and techniques studied for estimating the position and pose 
of an autonomous mobile robot are reviewed. The techniques vary depending on the kind of 
environment in which the robot navigates, the known conditions of the environment, and the 
type of sensors with which the robot is equipped. Broadly the position estimation techniques 
can be classified into the following four types: 1) landmark-based methods 2) methods using 
trajectory integration and dead reckoning 3) methods using a standard reference pattern and 4) 
methods using the a priori knowledge of a world model and matching sensor data with the world 
model for position estimation. 


2.1 Landmark based methods 

Using landmarks for position estimation is a popular approach. The robot uses the knowledge of 
its approximate location to locate the landmarks in the environment. Once these landmarks are 
identified and the range/attitude of these relative to the robot is measured, in general, the position 
and pose of the robot can be triangulated from these measurements with a reduced uncertainty. 
The landmarks could be naturally occurring in an outdoor environment ^ike the tops of buildings, 
roof edges, hill tops, etc., or could be identifiable beacons placed at known positions to structure 
the environment. One of the basic requirements of the landmark-based methods is that the robot 
should have the ability to identify and locate the landmarks, which is, in general not an easy task. 
The position estimation methods based on landmarks vary significantly, depending on the sensors 
used: range or visual sensors; the type of landmarks, i.e., whether they are point sources or lines 
etc.; the number of landmarks needed. 

Case [7] summarizes the landmark-based techniques for position estimation and presents 
new' method, called running fix method, for position estimation. Clare D. McGillem et al [39] 
also describe an infra-red location system for navigating autonomous vehicles. They present an 
efficient method for position estimation using three infra-red beacons to structure the environment 
and an optical scanner on the robot capable of measuring the angles between a pair of beacons. 
Nasr and Bhanu [45] present a new approach to landmark recognition based on the perception, 
reasoning, and expectation (PREACTE) paradigm for the navigation of an autonomous mobile 
robot. 

Sugihara [52] presents algorithms for the position estimation of a mobile robot equipped 
with a single visual camera. He considers the problem of a robot given a map of a room where 
it is to navigate. From the images taken by the robot, with the optical axis parallel to the floor, 
only the vertical edges are extracted. On the map the points from where the vertical edges can 
arise are assumed to be given. Sugihara then considers two classes of problems. In the first 
class, all vertical edges are identical, and he searches for the point where the image is taken by 
establishing a correspondence between the vertical edges in the images and those on the map. In 
the second class of problems, the vertical edges are not distinguishable from each other and the 
exact directions in which the edges are seen are not given; only the order in which they are found 
in the image is given. The problems are considered mainly from a computational complexity point 
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of view. 


Photogrammetry generally deals with the mathematical representation of the geometrical 
relations between physical objects in three-dimensional space based on their images recorded on a 
two dimensional medium. One of the problems of photogrammetry i6 to determine the location of 
an airborne camera from which a photograph was taken by measuring the positions of a number 
of known ground objects or landmarks on the photograph. This problem is sometimes known as 
the comers calibration problem . The orientation and position of the camera in the object space 
are traditionally called the camera’s exterior orientation parameters as opposed to its interior 
orientation parameters , which are independent of the co-ordinate system of the ground objects. 
The interior orientation parameters include such elements as the camera’s effective focal length, 
len6 distortion, decentering, image plane scaling, and optical axis orientation. 

The problem of estimating the position and pose of an autonomous mobile robot given a single 
visual camera is, in essence, similar to this camera exterior orientation problem in photogram- 
metry. However, since the robot is ground-based and has position encoders and other sensors 
on it, these can be used to constrain the possible orientation and pose. In general, the exterior 
camera orientation problem involves solving for six degrees of freedom, three rotational and three 
translational. Traditionally, in single camera photogrammetry, by observing the object’s feature 
points on the image, it is possible to solve the exterior orientation calibration problem using a 
traditional method known as space resection (El Hassan [15]).The method is based on perspec- 
tive geometry of a simplified camera model, derived from a pinhole optics, in which the image of 
each feature point is projected onto the image plane by a ray connecting the feature point with 
the pinhole lens. This collinearity condition results mathematically in two nonlinear equations 
for each feature point. Hence at least three non-collinear points are required to solve for the six 
degrees of freedom. These collinearity equations are linearized and solved in an iterative fashion. 
When the images are noisy, more than three points can be used, with least squares criteria, to 
take advantage of data smoothing. These methods axe now standard in the photogrammetry 
literature (Wolf [64]). Iterative solutions are generally more computationally demanding, so that 
simplifying assumptions are usually necessary for real-time applications. Over the years, a num- 
ber of alternate methods have been proposed in an effort to improve the efficiency of the camera 
calibration procedure. 

Fischler and Bolles [18] studied the exterior calibration problem in connection with the 
concept of random sample consensus (RANSAC), a methodology proposed for processing large 
data 6ets with gross errors or outliers. Ganapathy [20] presents a noniterative, analytic technique 
for recovering the six exterior orientation parameters as well as four of the interior orientation 
parameters, 2 for scaling and 2 for the location of the origin in the image plane. He essentially 
decomposes the given transformation into the various camera parameters that constitute the 
components of the matrix. Roger Tsai [60] presents a two stage technique for the calibration of 
both the exterior and interior parameters of the camera. It is probably the most complete camera 
calibration method proposed so far. Liu, Huang, and Faugeras [35] present a new method 
for determining the camera location using straight line correspondences. They show that the 
rotation matrix and the translation vector can be solved for separately. Both linear and nonlinear 
algorithms are presented for estimating the rotation. Rakesh Kumar [31] argues that the rotation 
and translation constraints, when used separately, axe very weak constraints and hence even small 
errors in the rotation stage become amplified into large errors in the translation stage, particularly 
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when the landmark distances from the camera are large. So he suggests solving for both the 
rotation and translation matrices simultaneously to achieve better noise immunity. Haralick et 
al [22] summarize the various cases of the position estimation problem using point data. They 
argue for robust estimation procedures in machine vision. Their thesis is that the least square 
estimators can be made robust under blunders by converting the estimation procedure to an 
iterative, reweighted least squares, where the weight for each observation depends on the residual 
error and its redundancy number. 

Most of the landmark- based approaches considered above suffer from the disadvantages of 
: 1) assuming the availability of landmarks in the scene around the robot; 2) depending on 
the visibility and the ability to recognize these landmarks from the image and to estimate the 
range/attitude to them from the current location; 3) requiring an approximate location to start 
with to check for the landmarks; and 4) needing a database of landmarks in the area to look for 
in the image. 


2.2 Trajectory integration and dead reckoning 


Another class of techniques estimate the position and pose of a mobile robot by integrating over its 
trajectory and dead reckoning i.e., the robot maintains an estimate of its current location and pose 
at all times and, as it moves along, updates the estimate by dead reckoning using sensors to sense 
the environment local to the new position. Features are detected from the sensory observations 
in one of the positions of the robot and are used to form the world model. As the robot moves, 
these features are again detected, and correspondence is established between the new and the old 
features. Usually the motion of the robot is known to a certain degree of accuracy as given by 
its positional sensors. These motion estimates are then used to predict the occurrence of the new 
positions for the features in the world model. The prediction is then used as an aid to limit the 
search space and to establish a correspondence between the detected features and those already 
in the current world model. After the positional sensors are used to establish correspondence, 
the motion parameters of the robot between the old and the new positions can be solved for 
explicitly. The solution provides a much more accurate positional estimate of the robot. The loop 
then continues, and the world model is continuously updated. If new features, which do not exist 
in the current world model, are detected, a mechanism to consistently update the world model is 
also provided. The type of sensing used is typically stereo triangulation or other visual sensing 
[41,42,37,46]. Crowley [10] uses a ring of 24 sonar sensors for a similar paradigm. 

Moravec’s Cart [41] was one of the first attempts at autonomous mobile robot navigation 
using a stereo pair of cameras. He defined an interest operator to locate the features in a given 
image. A coarse to fine correlation strategy was used to establish correspondence between the 
features selected by the interest operator between different frames. Matthies and Shafer [37] 
argue that a 3-D Gaussian distribution is a much more effective way to explicitly deal with stereo 
triangulation errors and errors due to the images’ limited resolution. They detail a method to 
estimate the 3-D Gaussian error distribution parameters (mean and co- variance) from the stereo 
pair of images. They then present a method to consistently update the robot position, explicitly 
taking into account the Gaussian error distribution of the feature points and motion parameters 
and their error co- variances. A Kalman filter approach is used to recursively update the robot 
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position from the detected features and the previously maintained world model. They assume the 
correspondence problem to be solved, and show by simulation data and experimental results that 
the Gaussian error model results in a more accurate stereo navigational paradigm. 

Faugeras and Ayache [17,16,3] also address the problem of autonomous navigation. Their 
approach is more rigorous and mathematical. They use trinocular 6tereo to detect object features 
and, the primitives they use for the 3-D world model are line segments. They propose a paradigm 
to combine coherently visual information obtained at different places to build a 3-D representation 
of the world. The end result is a representation of the environment by a number of uncertain 
3-D line segments, attached to co-ordinate frames and related by uncertain rigid motion. The 
measurements are combined in the presence of these uncertainties by using the Extended Kalman 
Filtering technique. Crowley [10] has a similar approach to Ayache and Faugeras [17]; he also uses 
a line segment based representation of the free space using Extended Kalman filtering techniques 
for dealing with error covariances. However, he uses a circular ring of 24 Polaroid ultra-sonic 
sensors, while Faugeras and Ayache use trinocular stereo. Chatila and Laumond [8] present a 
world modeling and position referencing system on their mobile robot HILARE. They take a 
multisensor approach of using a laser range finder for measuring depth and optical shaft encoders 
on the drive wheel axis for the trajectory integration. 

In these type of techniques the position estimation strategy depends the representation used 
for the sensory observations and the environment. Miller [40] presents a surface representation 
for real world indoor robots, equipped with a ranging device like a sonar and the robot odometry 
as sensors. He assumes the world to consist of a flat, open plane on which walls and obstacles 
are placed. Since the robot is limited to motion on the plane of the floor, the projection of walls 
and obstacles on the plane of the floor captures all the relevant world information. The basic 
unit of the spatial representation system is the map, composed of linked regions. Regions have a 
local coordinate frame. Walls and obstacles are themselves represented by line segments, whose 
end point positions are designated by coordinates in the frame of the region. The borders of the 
regions are marked with labels that specify the adjoining regions. Regions can be of four types, 
0-F, 1-F, 2-F and 3-F, since a floor dwelling mobile robot has 3 degrees of freedom, 2 translational 
( x and y ), and one rotational ( orientation 9 ). A type designation of j-F means that a sensor 
(here a sonar range sensor) can be used to eliminate j degrees of freedom. Having set the mapping 
scheme, Miller then presents methods for positional estimation as a heuristic search paradigm. 
The type of the region in which the robot is in determines the amount of positional information 
that can be calculated. If the robot is in a 0-F region, then the only positional information 
available would be extrapolations from the last known position, based on the robot’s ability to 
do dead reckoning. If the robot is known to be in a region that is 1-F or greater, then positional 
information can be found by taking several sensor readings and conducting a heuristic search over 
the tree of possible matches between the observations and the edges in the map. Once a consistent 
matching between the observations and the map edges is found, a simple geometric construction 
is performed to pinpoint the robot’s location and orientation. 

Moravec and Elfes [14,43] use a grid based representation called “Certainty Grids” for map- 
ping the environment a mobile robot will inhabit. The basic idea is to represent the floor by a 
rectangular grid and to store the information about the occupancy of different portions of the 
floor on this grid as probability distributions using a 6onar range 6ensor input. The authors also 
develop and present a fast algorithm for relating two maps of the same area to determine relative 
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displacement, angle, and goodness of the match. These can then be used to estimate the position 
and pose of the robot. 


2.3 Techniques using a Standard Pattern 


One other method of estimating the position and pose of the mobile robot accurately is to place 
standard patterns in known locations in the environment. Once the robot images these patterns 
and detects them, the position of the robot can be estimated from the known location of the 
pattern and its geometry. The pattern itself is designed to yield a wealth of geometric information 
when transformed under the perspective projection. Ambiguous interpretations are avoided, and 
a minimum of a priori knowledge about the camera is desirable. These methods are particularly 
useful in those applications where a high degree of accuracy in the positioning of the robot is 
required only after it is near a particular work station. Simple trajectory integration systems 
could be used to locate the robot near the work station, and, then, by identifying the mark ( 
standard pattern ) located near the work station, the robot can be positioned more accurately. 
Different researchers [19,9,36,12,26] have used different kinds of patterns or marks, and the 
geometry of the method and the associated techniques for position estimation vary accordingly. 


2.4 Model based approaches 


Some researchers consider the problem of the position estimation of a mobile robot when a priori 
information is available about the environment in which the robot is to navigate. This could be 
provided in terms of a CAD model of the building (or a floor map etc.) in the case of an indoor 
mobile robot, or a Digital Elevation Map (DEM) in the case of an outdoor robot. In these cases, 
the positional estimation techniques used take on a different flavor. The basic idea is, of course, 
to sense the environment using on board sensors on the robot and to try to match these sensory 
observations to the preloaded world model to arrive at an estimate of the position and pose of 
the robot with a reduced uncertainty. 

One problem with such an approach is that the sensor readings and the world model could 
be in different forms. For instance, given a CAD model of the building and a visual camera, the 
problem is to match the 3-D descriptions in the CAD model to the 2-D visual images. This is 
the problem addressed by Kak et al. [27] in their work. They present PSEIKI, a system that 
uses evidential reasoning in a hierarchical framework for image interpretation. They discuss how 
the PSEIKI system can be used for self-location by a mobile robot and how their approach is 
used with navigational system of the autonomous mobile robot PETER. The robot’s position 
encoders are used to maintain an approximate estimate of its position and heading at each point. 
However, to account for errors in the quantization effects of the encoders and slippage of the 
wheels, a visual sensor, in conjunction with a CAD model of the building is used to derive a more 
accurate estimate of the position and pose of the robot. The basic idea is that the approximate 
position from the encoders is used to generate, from the CAD model, an estimated visual scene 
that would be seen. This scene is then matched against the actual scene viewed by the camera. 
Once the matches are established between the features of the two images (expected and actual), 
the position of the robot can be estimated with a reduced uncertainty. Tsubouchi and Yuta [61] 
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discuss the position estimation techniques, used in their YAMABICO robot, which use a color 
camera and a map of the building in which the robot is to navigate. The authors propose a 
vision system using image and map information for a mobile robot with consideration of real time 
requirements. This system consists of three operations. The first operation is the abstraction of 
a specified image from a tv camera. The image is processed and highly abstracted information 
called the real perspective information is generated. The second operation is the generation of 
the estimated perspective information by co-ordinate transformation and map information, using 
information about the robot’s position and direction. The third operation is the establishment 
of the correspondence between the two perspectives. The authors use color images in their real 
perspective views. They argue for color images saying that they are invariant under lightness and 
shadow. From the color images the authors extract regions of similar color and fit trapezoids 
to these regions. From the map information trapezoids are also extracted and, in the matching 
process, these trapezoids from the two sources are used as matching primitives. 
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3 Position estimation in a mountainous terrain 


In this section a solution to the positional estimation problem is discussed for an autonomous land 
vehicle navigating in an unstructured mountainous terrain. A Digital Elevation Map (DEM) of 
the area in which the robot is to navigate is assumed to be given. More details are given in [54] 
and [55]. The robot is also assumed to be equipped with a camera that can be panned and tilted, 
and a device to measure the elevation of the robot above the datum. The robot is also assumed 
to have a compass and knowledge of the direction in which the DEM is gridded. No recognizable 
landmarks are assumed to be present in the environment in which the robot navigates. The 
DEM is a 3-D database. It records the terrain elevations for ground positions at regularly spaced 
intervals. The images recorded by the camera are 2-D intensity images. The problem is to match 
the 2-D images to the 3-D DEM. The solution presented makes use of the DEM information and 
structures the problem as a heuristic search in the DEM for the possible robot location. The 
shape and position of the horizon line in the image plane and the known camera geometry of the 
perspective projection are used as parameters to search the DEM for the possible camera(robot) 
location. Heuristics based upon from geometric constraints are used to reduce the search space. 
The algorithm makes use of the computer graphics rendering techniques to generate synthetic 
image data from the DEM to generate a solution to the problem. More importantly, the images 
generated from the DEM are used to disambiguate the robot’6 location from the various possible 
ones returned by the search process. 


3.1 The Approach 


The basic approach is to use the height and the exact shape of the horizon line ( HL) and the 
known camera geometry of the perspective projection to search in the DEM for the possible 
camera location. The actual search is a two step process. The first step is a coarser search which 
reduces the possible locations to a smaller set using the height of the HL in the image plane in 
different directions; and the second step refines this estimate using the exact shape of the HL. 

From the current robot position, images are taken in the four geographic directions: N,S,E 
and W. In generating the four geographic views, the tilt angle of the camera is adjusted until the 
horizon line is clearly visible in the image. This tilt angle fa(i = N,S,E,W ) is then measured. 
The approximate height H of the camera above the datum (with respect to which the DEM is 
specified) is assumed to be known. The height of HL at the center of the image plane in each 
of these four images is measured. This can be done using image processing techniques. Let this 
height be hi(i = N , S, E, W). The reason for using the height of the HL at the center of the image 
plane is that the DEM is assumed to be gridded in the same directions as those from which the 
images are taken. So the points that project onto the HL at the center all lie along the same 
grid line in the DEM. Using the approximate height of the camera H, the tilt angle fa and the 
HL height h, in one of the directions e.g., north, the DEM is searched for the possible camera 
locations. That is, a camera location is hypothesized at each of the DEM grid points and the 
height of the HL hi is back-projected onto the DEM using the camera geometry to see if any 
elevation points that can project to this height. If any such points exist, the camera location is 
marked as a possible candidate. Equation 3.1 below gives the estimated height Z et t for a distance 
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x from the hypothesized camera location. 
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(3.1) 


However, if any grid point is of a height larger than that estimated by the back-projection, 
then all the camera locations between the current location and this tall point are marked as 
impossible positions and discarded. This heuristic reduces the search space significantly. Similar 
heuristics are also used to prune the search space by the camera height H. The results of the 
search process by using one of the images is thus a sparser set of possible camera locations . 
These are then considered as the possible set for the next search, which searches among this set 
with the geometric constraints extracted from the image along another direction. This process 
is continued by successively applying the constraints in all the four directions and the search 
refines the possible locations to a small set usually clustered around the actual location. For a 
mountainous area, the heuristics used help to prune the search space significantly, since a large 
number of points are discarded as being impossible camera locations. 


3.2 The Derivation of Z eai 

An expression for the estimated elevation Z et t at a point in the elevation map, for a hypothesized 
camera location, and for a known imaging geometry shown in Figure 3.1 is derived below: 

T _ Image Plane Height 

■I'm ax = 2 

/ = Focal length of the camera 
0 = Perspective angle 
h = HLC height at the center of the image 
<f> = Camera tilt angle 
H = Height of the camera above the datum 
x = Distance between CAMERA and POINT 

Z eM t = H + Z\ + Z 2 

Z\ = x sin <j> 

f\ Ima t 

tan 6 = 


(3.2) 

(3.3) 
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Figure 3.1: The geometry of the projection 
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So finally from equations (3.2), (3.3) and (3.4) 
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The search algorithm 

The actual algorithm search algorithm is presented below. The DEM is referred to as a 2-D 
grid along x and y with elevation values defined at each of the grid points. The algorithm has 
two pointers, one pointing to the current hypothesized camera location, CAMERA, and the other 
pointing to a candidate grid point, POINT. The idea is to assume that the CAMERA is at a 
certain point (x,y). Search along this y line to see if any POINT exists which will project at the 
desired height hi onto the HL. If so mark this CAMERA as a possible camera position. Repeat 
the procedure for all the CAMERA positions along this y line and then for all y lines. 

Let CAMERA.X and CAMERA.Y denote the x and y grid values of the location CAMERA 
and let POINT. X and POINT.Y denote the values of the candidate point. Consider the search 
along one of the directions, say north, (see Figure 3.2). CAMERA.X is initialized to XMAX; 
POINT.X to XMIN; and CAMERA.Y and POINT.Y to YMIN. Using the known approximate 
camera height H and the tilt angle the height of the HL, h n, is back projected to the 
POINT using the geometry shown in Figure 3.1. The elevation Z eit necessary for this POINT 
to project on to the HL is estimated using Equation 3.1. The actual height Z ac t ua i (POINT. Z) 
is extracted from the DEM database. If Z actxia \ is equal to Z e3t , then CAMERA is saved as a 
possible camera position. If Z actua i is less than Z tt u then the POINT is updated to a position 
closer to the CAMERA(in this case, POINT.X is incremented).This process is continued until 
POINT coincides with CAMERA. Then POINT is re-initialized, and CAMERA is updated by 
moving it closer to POINT (in this case, CAMERA.X is decremented). If at some stage Z ac(ua i 
is greater than Z es t , then CAMERA is not a possible location since if it were the elevation at 
POINT, Zactuai would project a height higher than h jv- It can also be said that none of the 
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Figure 3.2: The search in the north direction 

points between CAMERA and POINT are possible camera locations since at all these, the same 
thing would have happened. This heuristic proves very useful in pruning the search space by 
a large extent for mountainous terrains with many altitude variations like those considered in 
the illustrations. The search process is then repeated along all the y lines in this direction, i.e., 
for CAMERA.Y and POINT.Y ranging from YMIN to YMAX. The possible camera positions 
returned by this search process are then considered as inputs for the next search, which searches 
among this set with geometric constraints extracted from the image along another direction. The 
process is continued successively by applying the constraints in all the four directions and the 
search refines the possible locations to a small set usually clustered along the actual location, (see 
Figure 3.10). 

Another heuristic used to reduce the search space is to use the approximate height H of the 
camera. Only those points of the DEM are considered as possible camera locations where the 
elevation Z octua / lies close to H, i.e., H — 6h < Z actua i < H + &H, where 6jj is the resolution of 
the sensor used to measure the camera’s height. 

The first stage of the search algorithm is formalized below for a search in the north direction. 
It is similar for the other directions except that the search is carried out only in the reduced set 
of candidate camera positions returned by the previous search process. The direction and hence 
the limits of the search vary depending on the direction in which the image is taken. 


3.3 The Algorithm Search 

Stage 1 of the search process for searching in the north direction is described below. 
Input : The DEM, hs, 4 >n and H. 

Output : A list of possible camera locations on the stack S. 
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1 Make stack S empty. 

2 POINT.Y = CAMERA .Y = YMIN. 

for (CAMERA.Y < YMAX) do 

3 POINT.X = XMIN. 

CAMERA.X = XMAX. 

for (POINT.X < CAMERA.X) do 

4 Z t ,t = get-z-estimate(/i;v,<^v, 

H, CAMERA, POINT) /* computed using equation 1 */ 
Z actual = POINT. Z. /* retrieved from the DEM */ 


5 if ( Z eat = Z ac t U ai ) Push(S, CAMERA). /* save current camera location */ 


else lf(^ac<ua( ^ Z e) i) 

Pop(S). /* discard the previously stored location and update the camera position 

V 

CAMERA.X = POINT.X. 

POINT.X = XMIN - 1. 

6 increment POINT.X 
endo 

7 decrement CAMERA.X 
endo 


End Algorithm Search. 

Stage 2 of the search process is used to further isolate the exact location from the possible 
ones returned by the stage 1 search. Each of these locations is considered as a possible candidate 
and the image that would be seen if the camera were located at that location is generated from 
the DEM using computer graphics rendering techniques. The HLs from these images are then 
extracted and correlated against the actual image HL to arrive at a measure of their disparity. 
The camera location corresponding to the location with the lowest disparity is considered as the 
best estimate of the location. The exact location can be further isolated by generating the images 
from the points neighboring to the estimated location and checking to obtain a zero error measure. 


3.4 The error analysis 


The search algorithm depends upon the errors in three parameters: the error in height H of the 
camera, the HL height h, in the images plane, and the tilt angle of the camera <f>{. These three 
parameters affect the estimated elevation, Z ei t , at a particular candidate point (POINT) for a 
given hypothesized camera position (CAMERA). The errors in Z eat directly reflect as errors in 
the positional estimation, since in step 5 of the algorithm, only when Z e ,t is equal to Z actua i is 
the current camera considered location as a possible candidate. So in the rest of the section, the 
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Figure 3.3: Z eti vs x for different ft, 

effects of errors in these three parameters on the computation of Z es t are analyzed and then the 
measures taken in the algorithm to account for the worst case errors are discussed. 

The equation relating Z e ,t to ft, ,<f>, and H, is given by ( 1 ) above. For a given size of the image 
(2/max, 2/ max ) and angle of perspective projection, 0, 7 ^ is a constant. So the variables are H, 
fc„ and <f>i. 


Expanding using Taylor’s series and neglecting the higher order terms, 


A Z e ,t * A H • 


dH 


+ A ft, • 


dZest 

dhi 


+ A <f>i- 


dZest 

d<j>i 


(3.5) 


Errors in H 


Equation 2 shows that the error in H is directly reflected as an error in Z est . So to account 
for these a worst case error in H is estimated from the sensor used to measure H. Let this be SH . 
In the search algorithm, this can be accounted for by considering the estimated elevation at the 
candidate point to be acceptable if Z est - 6H < Z actua i < Z est + 6H . 

Errors in ft, 

The errors in hi are mainly due to image quantization error and errors in the edge detector 
used to extract the horizon line. Since x, the distance between POINT and CAMERA, occurs 
as a multiplicative parameter in the second and third terms of Equation 1 , the errors in ft, are 
magnified by this and, hence, vary depending on x. The worst error compensation should take 
this into account. Figure 3.3 shows a typical plot of Z ei t versus x for different A ft;. As can be 
seen with increasing distance, the error in Z e> t grows almost linearly. Figure 3.4 shows a plot of 
A Z est versus A ft, for different <p,s and for fixed x and H . Ideally, A Z eti = 0 should occur at A ft, 
= 0 . However, due to image quantization error, this occurs for A ft, € (— 1 ,+ 1 ), as can be seen 
for different values of <p t . One way to account for these errors in ft,- is to back-project a band of 
ft, values , (ft, — A ft,, ft, + A ft;), instead of a single ft, in the estimation of Ze»t- Thus, for each 
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Figure 3.4: Z e »t vs A h for different fa 

CAMERA and POINT locations a range of acceptable elevations ( Z e $t + A Z et t, Z e , t - A Z ejt ) are 
obtained and CAMERA is considered as a possible camera location only if Z ac tual at this POINT 
lies within this range. This way the effect of x on A Z e »t is also implicitly accounted for. 

Errors in fa 


Similar to h,, the effects of errors in fa are also magnified by x and, hence depend on the 
distance between CAMERA and POINT. The algorithm is very sensitive to effects in errors in 
fa. As Figure 3.5 shows, an error of 0.5 degrees in fa causes an error of about 50 meters in Z es t 
for the given x, H , and h,. So, fa has to be measured accurately, or at least a good worst case 
estimate is required so that as before, Z e ,t can be compensated for. In the actual implementation 
of the algorithm, errors in fa are accounted for by considering the effect of these errors as errors in 
hi. That is, the band hi ± A hi is made wider to account for Afa. Hence, the range of acceptable 
Z es t values is increased. In the illustrations considered, it is found that a Ah, = 3 pixels accounts 
for 0.5 degrees error in fa and for a ±1 pixel error in h,. 

Another observation from Figure 3.5 is that AZ e $t = 0 does not always occur at Afa = 0. 
This is due to the roundoff errors in the computation of Z et t from fa,H ,hi, and x , which is a 
reasonably complicated trigonometric expression. 


3.5 Illustrations 

The algorithm is tested on a digital elevation map (DEM) of a mountainous area in Colorado 
obtained from the United States Geological Survey (USGS). The elevation data is a uniform 
square grid of 30m resolution and has 359 x 457 grid points. It covers an area of 148 square kms. 
Synthetic images for an assumed camera location are generated using the AT & T Pixel machine 
from the DEM. The elevation data is tessellated into polygons and surface normals are calculated 
at each of the elevation points using the four neighboring points. A light source position and 
direction are assumed and for a given camera location and perspective geometry, Gouraud shaded 


16 




Figure 3.5: Z e$ t vs A <f> for different h, 


polygons are drawn and projected on to the image plane to generate perspective views of the 
mountain range. Figure 3.6 - Figure 3.12 illustrate a typical run of the algorithm. Figure 3.6 
shows a typical picture used to test the algorithm. Figure 3.7 shows the HL extracted from this 
using a gradient operator from this image. Figure 3.9 shows the results of the first stage of the 
search process. Of the possible 164063 (359 x 457) possible locations, stage 1 of the search process 
using hx and the associated camera geometry returns 473 points as possible camera locations. 
These successively drop in the consecutive searches using hi in the other directions. Ultimately, 
there are about 8 possible camera locations as shown in Figure 3.10. Figure 3.11 and Figure 3.12 
show the results of the stage 2 of the search process. The figures show the HLs of some of the 
images generated in the north direction for the candidate camera locations returned by the first 
stage of the search process. A measure of the difference between these and the actual image HL 
in this direction is also shown. The measure used is a mean square error measure between the two 
contours. The candidate with the lowest error in this case 48.3442 units is then considered as the 
estimate of the camera location. The actual position is (200.1,100.5) and the estimated location 
is (201,102). The four neighbors of this point are then considered as candidates; images are also 
generated in the north direction for these positions. The HLs are then extracted and correlated 
against the original HL, and the best estimate of the location is isolated as the one with the least 
mean square error. 

Table 3.1 shows the results of running the algorithm at various locations on the DEM. As 
can be seen, in most cases the estimated position is quite close to the actual location. Zero mean 
Gaussian noise with a standard deviation of 5 is added to the horizon lines. This is used to 
represent the noise in the image formation process and the noise in the detection of the HL. Since 
the HLs are smoothed by a Gaussian filter and since actually a band of hi values is used in the 
back projection, the effects of the additive Gaussian noise are not felt much. As the test runs 
illustrate, the effect of this noise is to slightly shift the estimated position in some cases. However, 
the estimated position is still quite close to the actual location is all cases. So, the method is 
quite robust to errors in the imaging process. 
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Figure 3.6: A typical view 



Figure 3.7: The extracted HLC 



Figure 3.8: An aerial view of the DEM 
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Figure 3.9: The possible camera locations after searching in one direction (aerial view) 



Figure 3.10: The possible camera locations after searching in all four directions (aerial view) 



Figure 3.11: The disparity between HLCs 
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Figure 3.12: The disparity between HLCs 


Table 3.1: Search Results 


(1) 

(2) 

(3) 

(4) 

(5) ..... 

(150.2,100.8) 

342 

■ 1 

(151,101) 

(151,102) 

(150.0,200.0) 

126 

3 

(151,200) 

(151,201) 

(150.3,300.6) 

227 

8 

(150,301) 

(152,303) 

(200.1,100.5) 

473 

8 

(201,102) 

(202,100) 

(200.2,200.2) 

889 

S3 

(201,201) 

(200,202) 

(200.2,300.8) 

334 

9 

(201,300) 

(202,302) 

(300.0,100.0) 

334 

■1 

KTiWBlM 

(301,101) 

(300.5,200.8) 

273 

17 

(301,200) 

(302,202) 

(300.2,300.8) 

254 

10 

(300,300) 

(301,302) 
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4 Positional Estimation in a man-made environment 


In this section we address the navigational issues of a mobile robot navigating in an urban envi- 
ronment. The environment of the robot is assumed to consist of polyhedral buildings. The 3-D 
descriptions of the roof tops of the buildings are assumed to be given. Such descriptions could be 
obtained from the architectural plans of the buildings or from a stereo pair of aerial images. The 
robot is assumed to be equipped with a single visual camera that can be panned and tilted. The 
robot is not assumed to have the ability to recognize any landmarks and that a good estimate 
of the position and pose of the robot is not available. It is also shown that if such a capability 
exists it only helps in aiding the position estimation process presented. In this research a new 
method of representing the free space of the robot is presented. The free space is partitioned 
into distinct, non- overlapping regions called the Edge Visibility Regions (EVRs) using the stored 
model description of the environment. These regions have the property of capturing the geometric 
relations between the features (edges) in the model with respect to their visibility from different 
positions in the viewing plane. The idea is that the the problem of establishing the correspondence 
between the model features and the image features is alleviated by using these EVRs as a guide. 
A procedure for partitioning the free space into EVRs is described in this section. The use of this 
EVR representation of the free space in position estimation and path-planning are discussed in 
the future work. A derivation for an upper bound on the maximum number of EVRs that would 
be generated for a given model description in also given. Details are given in [56] and [58]. 


4.1 Edge Visibility Regions 


This subsection presents an algorithm to divide the free space of the robot into the EVRs. The 
environment of the robot is assumed to consist of polyhedral buildings and the 3-D descriptions 
of the rooftops, of which are assumed to be given as the world model. 

Let OXYZ be the world co-ordinate system (WCS) in which the world model is described. 
The robot is assumed to be equipped with a camera that can be panned and tilted with no roll. Let 
O’X’Y’Z’ be the robot (camera) centered co-ordinate system (CCS). See Figure 4.1. In order to 
estimate the position and pose of the robot in the world co-ordinate system, the transformation 
matrix T that transforms WCS into CCS needs to be solved. In general, this matrix has six 
degrees of freedom: three rotational ip,<t>,8 and three translational X,Y,Z, Hence T is a function 
of However, in the present problem since it is assumed that 1) the roll angle, 

= 0, 2) the tilt angle <f> is measurable ( a constant ), and 3) the robot is on the ground plane 
Z = 0, there are only three degrees of freedom for this matrix: two translational, X and Y and 
one rotational, 8. So, the robot navigates in the YY-plane of the world co-ordinate system and 
has an orientation, 8 , which is a negative rotation about the £-axis. So T(X,Y,8) transforms the 
WCS into the CCS. 

Initially, we consider the restrictive case of all the buildings as having flat, convex rooftops, 
and of being the same height. We then develop an algorithm to generate the EVRs and their 
associated VLs. The extensions of this algorithm to the general case are then discussed. In the 
former case, only the projections of the buildings onto the XY-plane need to be considered in 
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Figure 4.1: The world and robot co-ordinate systems 

forming the EVRs and the VLs, since the tilt angle <f> is assumed to be measurable. 

The problem can now be formally stated as follows: 

To partition a plane containing m, n-sided non-intersecting convex polygons into distinct, 
non-overlapping regions called the Edge Visibility Regions{ EVRs). These EVRs are characterized 
by the following properties: 


• associated with each EVR is a list of edges of the polygons that are visible in that region 
called as the visibility list(V L). 

• no two adjacent EVRs have the same VL. 


An edge / of a polygon is considered to be visible from a point p if there is at least one point 
q € / such that the line segment pq is not intersected by any other edge of any other polygon. 

Note that this problem is similar in spirit to the problem of computing the weak visibility 
polygon considered in computational geometry [48,59]. Another problem bearing similarities to 
this problem is that of forming the aspect graph of a polyhedral object for object recognition 
[25,51]. However, the above methods concern mainly a single polyhedral object and divide the 
3-D view space into regions based on the visibility of faces, edges, vertices, etc. of the object. 
The present method considers multiple objects and their occlusions. The objects considered are 
non-overlapping, 2-D, convex polygons, and the viewing space is a 2-D plane. 

The algorithm partition that divides the XY plane into the desired EVRs, along with their 
associated VLs, is presented below. The algorithm UBes three subprocesses called split, project, 
and merge. The basic idea of the algorithm is to start with the entire .YY-plane as one EVR with 
a NULL visibility list. Each polygon is considered in turn by extending each of its edges, and the 
EVRs that are intersected are divided into two new ones. The new EVRs then replace the old one 
and the VLs of the new EVRs are updated to account for the visibility of this edge by considering 
the edge to be visible in one half-plane, say left of the edge, and invisible in the other. This is 
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Figure 4.2: The EVRs after Split 

handled by the split process. For each new polygon considered, the mutual occlusion of the edges 
of this polygon with the other existing polygons is handled by forming the shadow region of these 
edges on the other existing polygons. This is handled by the project process. Finally the merge 
process concatenates all the adjacent EVRs with identical VLs into one EVR. After partitioning 
the AV-plane into EVRs, for each model edge in the VL of an EVR the interval of orientations 
of the robot for which this model edge is visible in the EVR is also computed and stored. An 
efficient method to compute this interval is also described. 

Split 


Given a 2-D convex polygon in a plane and a set of existing EVRs, this process is used to 
update the EVR list to account for the visibility of the edges of the polygon. Consider an n-sided 
convex polygon P in the viewing plane defined as a collection of n vertices, t>i, v 2 , and n 
edges, vjt> 2 , v 2 v 3 , ....v n _it> n , v n vi, such that no pair of non-consecutive edges shares a point. Then, 


1. extend each of the edges of P, vju 2 ,...,t? n Vi, in order and for each of these lines, and 

2. check if this line cuts an existing EVR in the current EVR list. If it does, split the EVR into 
two along this line, copy the VL of the EVR into the two new EVRs, update the EVR list 
by replacing the old EVR by the two newly formed ones, and update the VL of the region 
to the left of this line by adding this edge, r,r,+i, to its VL. 


The above procedure also returns the polygon itself as one of the EVRs. This is then removed 
from the EVR list by checking the list to see if any of the EVRs are the intersecting the polygon. 
Figure 4.2 shows the EVRs and their VLs after extending all the edges of P. 

Project 

This section discusses a process called project, which is used to account for the mutual 
occlusions between lines and polygons that affect their visibility from different locations in the 
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Figure 4.3: The shadow region for two lines 

viewing plane. The process is first described for two line segments, and then for a line segment 
and a polygon. 

Case 1: Consider the case of two lines, /1/2 and mim2, in a plane as shown in Figure 4.3. We 
now need to find the region in the plane where /1/2 is not visible due to occlusion from m\m 2 - 
Note that mi m2 lies to the left of (the visible side of) the directed line segment /1/2. Let us refer 
to this region as the shadow region of mi m2 due to /1/2. One way to find this shadow region is to 
extend the line l\Tn\ and then to find the intersection of this line with the viewing plane, as m^\ 
similarly, extend l 2 m 2 and find m 3 . Now the region is the desired shadow 7 region. 

The lines mi 7714 and m2 m3 are referred to as the shadow lines of mi m2 due to /i/2- 

Case 2: Consider the case of a line segment /1/2 and a convex polygon P in a plane. We need 
to find the shadow region of P due to the line /i/ 2 when P lies on the visible side of /i/ 2 . This 
region is the union of the shadow 7 regions of each edge v,v,+ 1 of P due to /x/2* An easier way to 
compute this shadow region is to find the two vertices of P, x and y, such that the shadow 7 region 
of the line segment XY due to lil 2 is the desired shadow region of P due to /1/2- The two vertices 
x and y are referred to as the shadow vertices of P due to /1/2. There are two sub cases to be 
considered: 


• subcase 1 : In this subcase the entire polygon lies to the left of the line /1/2* That is, the 
extension of the line does not cut any of the edges of the polygon. Figure 4.4 shows this. 

• subcase 2 : In this subcase, only part of the polygon lies to the left of the line. That is, the 
extension of the line cuts some of the edges of the polygon. Figure 4.5 shows this. 


The determination of the shadow vertices is different in each of these subcases. So, the 
process project detects these subcases and uses a different method in each. For subcase 1, shown 
in Figure 4.4, the shadow vertices are found as follows: the vertex of P corresponding to the 
max of the angle Z/ 2 /i^, * = l,2,....n, gives the shadow vertex x, and similarly the vertex of P 
corresponding to the maximum of Z/i^Vj, t = l,2,....n, gives the shadow vertex y. 
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Figure 4.4: The shadow regions for subcase 1 



For subcase 2, shown in Figure 4.5, it is slightly more involved to find x and y. First, the 
convex hull of the points Vi, v 2 , v 3i —v n and / 2 is formed. Then the two vertices of the convex 
hull adjacent to / 2 , v\ & V 3 , are considered. Of these, the vertex to the left of the line /j / 2 is 
considered to be the shadow vertex y and the other to be x. Hence, x = t>i and y = V4. To 
find the shadow region, the line lix is extended and its intersection with the plane is found as 
a; similarly intersection for the line l^y is found as b. Thus, the shadow lines are xa and yb and 
the shadow region is xaby. Once the shadow region is formed, the edge Ji/ 2 , numbered as 5 in 
the Figure 4.5, is marked as invisible and, hence, removed from the VLs of all the EVRs that lie 
inside this shadow region. Note that in subcase 2, all of the shadow region does not lie in the 
visible side of the line lih- As a result we have adjacent EVRs with identical VLs. However, the 
Merge process, to be discussed next, accounts for this. 

Merge 

Given an EVR list and the associated VLs, this process is used to search the list for adjacent 
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EVRs that have identical VLs. Since the EVRs are actually convex polygons, two EVRs axe 
considered as adjacent if they have one edge common. These EVRs are then merged into a single 
EVR, and the EVR list is updated if they have identical VLs. This step is run iteratively, each 
time merging the adjacent EVRs and forming new ones until no two adjacent EVRs have identical 
VLs. 


4.2 The Algorithm Partition 


The algorithm partition i6 described below, which given a list of convex polygons and a viewing 
plane, divides the viewing plane into EVRs and forms their associated VLs. The algorithm 
initially assumes the entire viewing plane to be one EVR and then iteratively splits this plane 
into sub-regions, considering each of the polygons in turn using the process Split. Whenever a 
new polygon is considered, not only are the existing EVRs split to account for the visibility of 
its edges, but the occlusions of the edges of this polygon due to all the other existing polygons 
in the plane are also considered using the Project process. The Merge step is run after each 
Project step to merge adjacent EVRs that have the same VLs. As described before, the routine 
Split(P,List), takes a polygon P and an EVR list List , and modifies the EVRs in List and their 
associated VLs depending on the visibility of the polygon P’s edges. The Project(e, P,List) 
routine takes in an edge e , a polygon P, and an EVR list List and modifies the List and the 
associated VLs to take into account the occlusion of e due to the edges of P. The Merge(List) 
routine takes an EVR list, List, merges the adjacent EVRs having the same VLs, and returns the 
modified List. 


Algorithm Partition 

Input : A set of convex polygons in a plane. 

Output : A set of EVRs and their associated VLs. 


1. Initialize 

1.1 Initialize the EVRJLIST to contain one region, 

the entire plane 

1.2 Set the VL of this region to NULL. 

1.3 Initialize the USED-POLYGONS list to NULL. 

2. For ( i = 1 to i < Nojof -ALL-POLYGONS ) 

do 

2.1 Set CURRENT-POLYGON = 

ALL-POLYGON S{i) 

2.2 Call Split(CU RRENT JPOLY GON, EVR.LIST) 

2.3 For ( j = 1 to j < No-of-U SED-POLYGON S ) 

do 

2.3.1 For ( k = 1 to k < No_oLedgesjof( 

CURRENT-POLYGON) ) 
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Figure 4.6: The EVRs of two convex polygons 


do 

2.3. 1.1 Call Project( 

EDGEjof (CURRENT POLYGON, k ), 
USEDJ>OLYGONS(j ), EVRJ.IST) 

2.3. 1.2 Call Merge(EV RJLIST) 
endo 

2.3.2 For ( k = 1 to k < No_oLedges_of( 

USEDJ>OLYGONS(j)) ) 

do 

2.3.2. 1 Call Project( 

EDGE jo f(U SED.POLYGON S(j),k), 
CU RRENT .POLYGON, EVRMST) 

2.3. 2.2 Call Merge(EVR_LIST ) 
endo 

endo 

2.4 U SEDSOLYGON S( 

No.of.US ED POLYGONS) 

= CU RRENT -POLYGON 

2.5 Increment Nojof JJ SED POLYGON S 
endo 

3. Exit 


Figure 4.6 shows the EVRs and their VLs for the case of two convex polygons. 


4.3 The Estimation of the interval of orientations 

For each model edge in the VL of an EVR the interval of orientations of the robot for which this 
model edge is visible in the EVR is also computed and stored. An efficient method to compute 
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Figure 4.7: The shadow regions for buildings of unequal height 

the interval is given below. Given an EVR and a model edge ei in the VL of the EVR, we need 
to find the lower bound, 0 mtn , and the upper bound, 0 max , of the orientation angles of the robot 
(camera) for which this edge e, is visible in the EVR. Let the vertices of the EVR be r,, t = 1, ..n. 
Let pi, P 2 be the end points of e,\ Then it can be easily shown that 0 m , n is the minimum of the 
angles made by the lines joining P 2 to v<(i = 1, ..n) and that 6 max is the maximum of the lines 
joining pi to V{ , i = 1, ..n. Hence, given an EVR and a model edge e* in its VL, the maximum and 
the minimum orientation angles can be estimated by considering just the vertices of the EVR. 


4.4 Generalizations 


In the case of the buildings with rooftops that are not convex, the non-convex polygons repre- 
senting the projections of the rooftops on to the XF-plane are decomposed into a set of adjacent, 
component convex polygons [28,48]. Only decompositions without Steiner points are considered, 
since the modifications to the existing algorithms in this case are straightforward. The extra 
edges that are added in the process are considered as dummy edges, and their visibility is not 
marked in the VLs of the EVRs. Hence, the self occlusions of the edges of a non-convex polygon 
are handled by decomposing the polygon into component convex polygons and dealing with their 
mutual occlusions by using the project process. In the case when all the buildings are not of the 
same height, it is insufficient to just consider the projections of the rooftops onto the XF-plane 
alone when forming the shadow regions. The project proce66 is modified to consider a Z- shadow 
line also. Figure 4.7 illustrates this case. Here the shadow region of line a on to the polygon P 
is, hence, the region bounded by L,M and N , where N i6 the Z-shadow line. Note that the case 
of all the buildings of the same height is actually a special case of unequal height buildings where 
the Z-shadow line is at infinity. In the case when the rooftop of a building is not flat (planar), it 
is decomposed into convex planar polygons and each of these is considered as a separate polygon; 
the partition algorithm is then modified as before, in the non-convex case. Also, the Z-shadow 
lines are drawn to account for these component polygons, which are now convex and flat but of 
different heights. 
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4.5 Estimating the Number of EVRs 


An interesting and important question related to using this method is how many EVRs would be 
generated using the partition algorithm. If this number is very large, it would be impractical to 
use the method. One might think that the number of distinct EVRa grows exponentially with m 
and n the number of polygons and the number of sides of each polygon, respectively. We derive 
an upper bound on the maximum number of EVRs that would be generated and show that this 
is polynomial in m and n, 0(n 2 m 4 ). The derivation is based on induction and is similar to that 
outlined by Ikeuchi and Kanade [25] in the context of aspect graphs for object recognition. 

Consider the case when there are k n-sided convex polygons in a plane. Let the total number 
of lines in the plane be given by L(k). We have 


L(k ) = L(k - 1) + n + 2 • n ■ (k - 1) 


+ 2 • n • (k — 1) 


(4.1) 


where the first term indicates the number of lines already existing when ( k — 1) polygons have 
been considered; the second term indicates the number of lines drawn by extending the edges 
of the kth polygon; the third term accounts for the shadow lines drawn from the n edges of the 
A:th polygon onto the other k - 1 already existing polygons; and finally the fourth term accounts 
for the shadow lines drawn from each of the n edges of the existing k — 1 polygons onto the kth 
polygon. Recall that each edge of a polygon contributes two shadow lines for each of the existing 
polygons lying on the visible side of the edge. Solving (1) we have, 


L(k) = kn + 4n[(k - 1) + (fc — 2) + 1] 

L(k ) = kn(2k — 1). 

Let T(m) denote the maximum number of EVRs generated when we have m n-sided, convex 
polygons in a plane under perspective projection. Note that T(0) = 1. Consider the situation 
when there are m - 1 polygons already existing and in the plane. The maximum number of lines 
in the plane is L(m - 1) = n(m — l)(2m — 3). Now when we consider the mth polygon, we 
need to perform three steps which increase the number of lines and hence the EVRs. First, each 
edge of the mth polygon is extended, which gives n lines. Second, for each of the n edges of this 
mth polygon, we draw two shadow lines onto each of the other m — 1 polygons already in the 
plane; this gives 2n(m — 1) lines. Finally, for each of the n edges of the m — 1 polygons, we draw 
two shadow lines onto the mth polygon, which also gives 2n(m — 1) lines. So, we add a total of 
n + 4n(m - 1) new lines by adding the mth polygon. 

If we assume that each of these n(4m - 3) lines is added successively, the number of EVRs 
grows after each new line is added. We can derive the maximum number of EVRs by induction. 
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Let T(m)k denote the number of EVRs after the fcth line is drawn. Since we started with T(m- 1) 
EVRs and L(m — 1) lines, after the first additional line is drawn, the number of EVRs, T(m)i, is 


T(m) i = T(m — 1) + L(m — 1) + 1 


since this line is cut by the existing L(m — 1) lines into L(m — 1) + 1 segments (maximal case). 
These will divide L(m - 1) + 1 regions into two, thus adding L(m — 1) + 1 new EVRs. Proceeding 
in this fashion, 


r(m) 2 

■^(^)n(4m- 3) 


T(m) j + L(m — 1) + 2 
T(m) = T(m) n ( 4m _ 3 )_| 


+ L(m — 1) + n(4m - 3) 

T(m) — T(m - 1) + n(4m — 3 )L(m — 1) 


n(4m— 3) 

+ E * 

i=i 

= T(m - 1) + [,,(<ro ~ 3)1 ‘ + ^ 7 3 > 
2 2 

+ n(4m - 3)[n(m — l)(2m — 3)] 

= T(m — 1) + 0(n 2 m 3 ) + 0(n 2 m 2 ) 


+ 0{mn ) 

= 0(n 2 m 4 ) 


It is worth pointing out that although the maximum number of regions suggested by this 
derivation is of 0(n 2 m 4 ), in practice it is very unlikely that so many EVRs will be generated. 
First, the derivation assumes that whenever a new line is drawn it cuts all the existing lines in 
the plane, which very rarely happens. Second the shadow lines are considered to be drawn to all 
the existing polygons at any given time. In practice, however, we need to draw the shadow lines 
only onto the polygons that lie on the visible side of the edge. Since we are considering convex 
polygons, the situation of all the polygons lying on the visible side of all the edges is an impossible 
case. Also, the Merge process is quite effective and reduces the number of regions by a significant 
amount, particularly as more more and more polygons are considered. Taking all these factors 
into account, we find the number of regions to be much smaller than 0(n 2 m 4 ). However, this 
upper bound serves the useful purpose of showing that the number of EVRs is, in the worst case, 
still polynomial in n and m. 
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4.6 Positional Estimation and Path Planning 


Having developed a methodology to partition the free space in the into EVRs, we now propose 
a procedure to estimate the position and pose of the robot in the environment. Also, we discuss 
the use of the EVRs in path-planning is discussed. 


4.6.1 Positional Estimation 


As discussed in before, the position of the robot is given by three parameters, (X,Y, 6). These 
can be obtained by solving the transformation matrix T(X,Y,6), which, transforms the WCS into 
the CCS. This matrix can be solved for by establishing a correspondence between the features in 
the WCS and the CCS. The features we plan to use are the line segments that constitute the roof 
tops of the buildings, since the 3-D descriptions of these are assumed to be given in the world 
model. 

To establish the correspondence between the world model features and their images, one 
possibility is to formulate the problem as a search paradigm [21], i.e., to form an interpretation 
tree of all the possible pairings between the model and the image features and to search this 
interpretation tree for a set of consistent pairings which can then be used to Bolve the transform 
T(X,Y, 0). However, since the 3-D description of the model features are given, by using the 
geometric relations between them, and the known perspective geometry of the camera, we can 
formulate constraints as to which features will be visible from different places in the XY -plane. 
The interpretation tree can then be pruned using these constraints so as not to consider all 
the possible pairings. Recall that the EVRs 6tore all the relevant geometric information about 
the model features, like, the number of features visible from a given region and the interval of 
orientations of the robot for which these features are visible. So given a set of images features 
the EVRs can be used to prune the interpretation tree in establishing a consistent set of pairings 
between the image and model features and, hence, the position and pose of the robot can be 
computed accurately from this set of matches. 

Usually the robot position and pose, as given by its position encoders, is available, which 
essentially isolates the robot position to lie within a few EVRs. So only these EVRs need to be 
considered in searching for the exact location Also if the robot can identify a landmark (one of 
the 3-D features), this ability can be used to isolate the robot position to lie only within those 
EVRs that contain this landmark edge in their VLs. Similarly, if a rough estimate of the 6 value 
is available, it can be used to quickly prune some of the hypotheses. 


4.6.2 Path Planning 


Planning paths from the start to the goad and executing them is an important navigational task to 
be performed by all autonomous mobile robots [49]. The EVRs are a convex region representation 
of the free space. Representing the free space by convex regions has the advantage that any path 
chosen inside an EVR will be entirely within the EVR and, hence, will be obstacle free. So, to 
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plan an obstacle free path between a start position S and a goal position G, the free space is 
represented as a graph, with the EVRs as the nodes and an arc existing between two EVRs, if 
they are adjacent. If 5 is in EVRj and G is in EVRj , then an obstacle free path between S and 
G can be found by a graph search for a connected component list between EVRj and EVRj. If 
the arcs are weighted by the distance metric or some other function to be optimized, the graph 
search can be used to yield the minimum distance path. Also, since each EVR has a VL of the 
world model edges, this can be used to establish the robot’s position in each EVR accurately at 
every desired instant and, hence, the path can be followed more precisely. 
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5 Conclusions 


In this report we have presented the work done under the NASA Grant NAG 9-361. The problem 
of estimating the position and pose of an autonomous mobile robot in an outdoor environment 
is considered in this research. A comprehensive review of the various positional estimation tech- 
niques studied in the literature is first presented. Then positional estimation techniques for an 
outdoor mobile robot in two different kinds of environments are described; a mountainous natural 
terrain and an urban, man-made environment with polyhedral buildings. In both these cases the 
robot is assumed to be equipped with a single visual camera that can be panned and tilted and 
also a 3-D description (world model) of the environment is given. This world model is in the form 
of Digital Elevation Map (DEM) for the natural terrain case. For the urban environment, it is 
assumed that the 3-D descriptions of the roof tops of the buildings are given. Such a description 
could be obtained from the architectural plans of the buildings or from a stereo pair of aerial 
images. Techniques for positional estimation using the camera input and the world model are 
presented. 

In the natural terrain case the Horizon Line Contour(HLC) extracted from the camera im- 
ages is used as a distinctive feature to search the DEM for the position of the robot. Various 
heuristics drawn from the geometric constraints are used to reduce the search space. In the urban 
environment, a new intermediate representation of the environment of a mobile robot called the 
Edge Visibility Regions^VRs) is developed. This representation can be formed offline from a 
given stored model description of the environment. The uses of representing the environment as 
EVRs in alleviating the problem of establishing a correspondence between the sensor readings and 
the stored model for the various navigational tasks of the mobile robot like positional estimation 
and path planning are discussed. 
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